CLAIMS

What is claimed is: 

1. A system that facilitates enhancement of a speech signal comprising:

an input component that receives a speech signal and pixel-based image data 
relating to an originator of the speech signal; and, 

a speech enhancement component that employs a probabilistic-based model that 
correlates between the speech signal and the image data so as to facilitate discrimination 
of noise from the speech signal, the model employing a set of hidden variables 
representing relevant features, the features being inferred from at least one of the speech 
signal and pixel-based image data. 

2. The system of claim 1, the probabilistic-based model comprising an audio model, 
the audio model based, at least in part, upon: 

$p(u \mid s) = \prod_k N(u_k \mid 0, \sigma_{sk})$

$p(s) = \pi_s$

$p(w \mid u) = \prod_k N(w_k \mid h u_k, \phi_k)$

where $u_k$ is a clean speech signal,

$w_k$ is the speech signal,

$s$ is a state variable of the speech signal, and,

the notation $N(x \mid \mu, \sigma)$ denotes a Gaussian distribution over random variable
$x$ with mean $\mu$ and inverse covariance $\sigma$.
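As an illustration of how the audio model of claim 2 could be evaluated, the following Python sketch integrates the clean speech u out analytically to obtain a posterior over the state s, and then forms the per-state posterior mean of u. It is a minimal sketch under assumptions that are not stated in the claims: the frequency coefficients are treated as circular complex Gaussian variables, h is a real scalar, and all function names are hypothetical.

```python
import math

def log_gauss_complex(x, mean, precision):
    """Log density of a circular complex Gaussian given its precision (assumed form)."""
    return math.log(precision / math.pi) - precision * abs(x - mean) ** 2

def state_posterior(w, pi, sigma, h, phi):
    """p(s | w) for the audio model, with the clean speech u integrated out.

    w: observed coefficients w_k; pi[s]: state prior; sigma[s][k]: precision of u_k
    in state s; h: scalar gain; phi[k]: noise precision.
    """
    log_post = []
    for s in range(len(pi)):
        lp = math.log(pi[s])
        for k in range(len(w)):
            # w_k = h * u_k + noise, so its variance given s is h^2 / sigma_sk + 1 / phi_k.
            var = h * h / sigma[s][k] + 1.0 / phi[k]
            lp += log_gauss_complex(w[k], 0.0, 1.0 / var)
        log_post.append(lp)
    m = max(log_post)                      # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [x / total for x in weights]

def clean_speech_mean(w, s, sigma, h, phi):
    """Posterior mean of u_k given w_k and state s (a Wiener-like gain)."""
    return [h * phi[k] / (sigma[s][k] + h * h * phi[k]) * w[k] for k in range(len(w))]
```

A state with low precision (high energy) explains large observed coefficients better, so it receives most of the posterior mass, and the gain applied to w_k shrinks toward zero as the noise precision phi_k falls.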

3. The system of claim 1, the probabilistic-based model comprising a video model, 
the video model based, at least in part, upon: 

$p(l) = \text{const.}$

$p(v \mid r) = \prod_i N(v_i \mid \sum_j A_{ij} r_j + \mu_i, \nu_i)$

$p(y \mid v, l) = \prod_i N(y_i \mid v_{i-l}, \lambda)$

where $y$ is the pixel-based image,

$r$ is a hidden variable,

$A$ is a matrix of weights for the hidden variables $r$,

$l$ is a location parameter,

$v$ is a hidden clean pixel-based image,

$v_{i-l}$ is shorthand for $v_{\xi(x_i - x_l)}$,

$x_i$ is the position of the $i$th pixel,

$x_l$ is the position represented by $l$, and,

$\xi(x)$ is the index of $v$ corresponding to 2D position $x$.
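The role of the location parameter in the video model of claim 3 can be illustrated with a short Python sketch. Because the prior over the location is constant, the most probable location given a clean image v and an observed image y is the one that maximizes the likelihood of y. The sketch assumes real-valued Gaussian pixel noise with a single precision `lam`, wrap-around at the image border, and a small search range; none of these details come from the claims, and the function names are hypothetical.

```python
import numpy as np

def shift_log_likelihood(y, v, shift, lam):
    """log p(y | v, l) up to an additive constant for a 2D shift l = (dy, dx).

    Each observed pixel y_i is modelled as the clean pixel at position x_i - x_l,
    so the clean image is moved by +shift (wrap-around is an assumption).
    """
    moved = np.roll(v, shift, axis=(0, 1))
    return -0.5 * lam * np.sum((y - moved) ** 2)

def most_likely_shift(y, v, lam, max_shift=2):
    """Exhaustive search over shifts; valid because the prior p(l) is constant."""
    candidates = [(dy, dx)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    scores = [shift_log_likelihood(y, v, c, lam) for c in candidates]
    return candidates[int(np.argmax(scores))]
```

An exhaustive search is only practical for a small range of translations; it is used here because it makes the indexing convention explicit.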

4. The system of claim 1, the probabilistic-based model comprising an audio/video 
model, the audio/video model based, at least in part, upon: 

$p(r \mid s) = \prod_j N(r_j \mid \eta_{sj}, \psi_{sj})$

where $r$ is a hidden variable,

$s$ is a state variable of the speech signal,

$\psi$ is a precision matrix parameter associated with $s$, and,

$\eta$ is a precision matrix parameter associated with $s$.
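To show how the audio/video model of claim 4 ties the visual hidden variable r to the speech state s, the Python sketch below computes a posterior over s from r alone. It is an interpretation rather than the claimed method: it places one parameter in the mean slot and the other in the inverse-covariance slot of the Gaussian, uses a diagonal precision, and assumes a state prior `pi`. The function names are hypothetical.

```python
import math

def log_gauss(x, mean, precision):
    """Log density of a real Gaussian parameterised by mean and precision."""
    return 0.5 * math.log(precision / (2.0 * math.pi)) - 0.5 * precision * (x - mean) ** 2

def state_posterior_from_video(r, pi, eta, psi):
    """p(s | r) proportional to pi_s * prod_j N(r_j | eta_sj, psi_sj)."""
    log_post = []
    for s in range(len(pi)):
        lp = math.log(pi[s])
        lp += sum(log_gauss(r[j], eta[s][j], psi[s][j]) for j in range(len(r)))
        log_post.append(lp)
    m = max(log_post)                      # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [x / total for x in weights]
```

In a full system this visual evidence would be combined with the audio likelihood for the same state, which is what lets the image data help separate speech from noise.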

5. The system of claim 1, modification of at least one parameter of the probabilistic
model being based upon a variational expectation maximization algorithm having an
E-step and an M-step.

6. The system of claim 5, the expectation maximization algorithm being based, at
least in part, upon the equation:

$p(u, s, r, v \mid y, w) \approx q(u \mid s)\, q(s)\, q(r \mid s)\, q(v \mid r, l)\, q(l)$

where $u$ is a clean speech signal,

$s$ is a state variable of the speech signal,

$r$ is a hidden variable,

$v$ is a hidden clean pixel-based image,

$y$ is the pixel-based image,

$w$ is the speech signal, and,

$l$ is a location parameter.
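One consequence of a factorization of this kind is that a posterior expectation of the clean speech can be formed state by state and then averaged with the state posterior. The Python sketch below shows that step only. The names `q_s` (state posterior) and `rho` (per-state posterior means of the clean speech) are hypothetical and do not appear in the claims.

```python
def posterior_mean_speech(q_s, rho):
    """E[u_k] = sum_s q(s) * E[u_k | s] under a posterior of the form q(u | s) q(s).

    q_s[s]: posterior probability of state s.
    rho[s][k]: posterior mean of the clean coefficient u_k given state s.
    """
    n_bins = len(rho[0])
    return [sum(q_s[s] * rho[s][k] for s in range(len(q_s))) for k in range(n_bins)]
```

For example, with `q_s = [0.25, 0.75]` and per-state means `[[1, 2j], [3, -2j]]`, the estimate is the weighted average of the two rows.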

7. The system of claim 5, the expectation maximization algorithm being based, at
least in part, upon the equation:



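[Equation not recoverable from the source. Legible fragments: an update $h = \ldots$; the terms $|w_k|^2 - 2h\,\mathrm{Re}(w_k\,E u_k) + E|u_k|^2$, indexed by $k$; and a definition of $E u_k$ as a sum over $s$.]

where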



$u_k$ is a clean speech signal,

$w_k$ is the speech signal,

$\pi_s$ is a prior probability parameter of $s$,

$\sigma_{sk}$ is an inverse covariance, and,

8. The system of claim 7, the expectation maximization algorithm being based, at
least in part, upon the equation:

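$A = (E v r^T - E v\, E r^T)(E r r^T - E r\, E r^T)^{-1}$

$\mu = E v - A\, E r$

$\nu^{-1} = \mathrm{Diag}(E v v^T - A\, E r v^T - \mu\, E v^T)$

where "Diag" refers to the diagonal of the matrix, and,

$E r = \sum_s \pi_s \eta_s$

$E r r^T = \sum_s \pi_s (\eta_s \eta_s^T + \psi_s^{-1})$

$E v = \sum_s \pi_s (A \eta_s + \mu)$

$E v r^T = \sum_s \pi_s [(A \eta_s + \mu)\eta_s^T + A \psi_s^{-1}]$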
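The parameter updates of claim 8 have the form of a linear regression of v on r: A is the cross-covariance of v and r multiplied by the inverse covariance of r, the offset is Ev minus A times Er, and the inverse precision is the diagonal of the residual second moment. The NumPy sketch below computes the three updates from given expectations. How the expectations are obtained in the E-step is outside the sketch, and the function and argument names are hypothetical.

```python
import numpy as np

def video_m_step(Er, Err, Ev, Evr, Evv):
    """Update A, mu and the inverse precision from posterior expectations.

    Shapes: Er (J,), Err (J, J), Ev (I,), Evr (I, J), Evv (I, I).
    """
    cov_vr = Evr - np.outer(Ev, Er)        # E[v r^T] - E[v] E[r]^T
    cov_rr = Err - np.outer(Er, Er)        # E[r r^T] - E[r] E[r]^T
    # Solve A @ cov_rr = cov_vr instead of forming the inverse explicitly.
    A = np.linalg.solve(cov_rr.T, cov_vr.T).T
    mu = Ev - A @ Er
    Erv = Evr.T                            # E[r v^T] is the transpose of E[v r^T]
    nu_inv = np.diag(Evv - A @ Erv - np.outer(mu, Ev))
    return A, mu, nu_inv
```

Using `np.linalg.solve` rather than an explicit matrix inverse is a numerical design choice of the sketch, not something taken from the claims.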

9. The system of claim 8, the expectation maximization algorithm being based, at
least in part, upon the equation:

[Equation not recoverable from the source; the legible residue suggests a quantity indexed by $s$ and $j$.]

10. The system of claim 1, the image data comprising information associated with an
appearance of the lips of the originator of the speech signal. 

11. The system of claim 1, wherein the speech enhancement component tracks the lips of the
originator of the speech signal in order to facilitate discrimination of noise from the 
speech signal. 

12. The system of claim 1, the input component further comprising a frequency 
transformation component that receives windowed signal inputs, computes a frequency 
transform of the windowed signals, and provides outputs of frequency transformed 
windowed signals to the speech enhancement component. 

13. The system of claim 12, further comprising a windowing component that applies 
an N-point window to the speech signal and provides the windowed signal inputs to the 
frequency transformation component. 
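The windowing and frequency transformation of claims 12 and 13 can be sketched in Python as follows. The claims specify only an N-point window and a frequency transform; the Hann window, the hop size, and the use of a real FFT are assumptions made for illustration, and the function name is hypothetical.

```python
import numpy as np

def windowed_spectra(signal, n_window=256, hop=128):
    """Apply an N-point window to successive frames and return their spectra.

    Returns an array of shape (n_frames, n_window // 2 + 1) of complex coefficients.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_window:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(n_window)          # assumed window shape
    n_frames = 1 + (len(signal) - n_window) // hop
    frames = np.stack([signal[i * hop: i * hop + n_window] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)     # one spectrum per windowed frame
```

Each row of the result would correspond to one frame of frequency transformed, windowed signal passed on to the speech enhancement component.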

14. The system of claim 1, further comprising at least two audio input devices that 
provide speech signals. 

15. The system of claim 1, the probabilistic-based model being trained, at least in 
part, during operation of the system. 

16. The system of claim 1, the features comprising at least one of a speech state and 
lip motion. 

17. The system of claim 1, wherein the model incorporates an additional degree of 
freedom that models image translation.

18. A method facilitating enhancement of a speech signal comprising:

receiving a speech signal;

receiving pixel-based image data relating to an originator of the speech signal; and,

generating an enhanced speech signal based, at least in part, upon a
probabilistic-based model that correlates between the speech signal and the image data
so as to facilitate discrimination of noise from the speech signal.

19. The method of claim 18 further comprising providing an output associated with
the enhanced speech signal.

20. A data packet transmitted between two or more computer components that 
facilitates enhancement of a speech signal, the data packet comprising: 

an enhanced speech signal, the enhanced speech signal being based, at least in 
part, upon a probabilistic-based model that correlates between speech signal and image 
data related to an originator of the speech signal so as to facilitate discrimination of noise 
from the speech signal. 

21. A computer readable medium storing computer executable components of a
system that facilitates enhancement of a speech signal, comprising:

an input component that receives a speech signal and pixel-based image data 
relating to an originator of the speech signal; and, 

a speech enhancement component that employs a probabilistic-based model that
correlates between the speech signal and the image data so as to facilitate discrimination 
of noise from the speech signal. 

22. A system that facilitates enhancement of a speech signal comprising:

means for receiving a speech signal and pixel-based image data relating to an
originator of the speech signal; and,

means for enhancing the speech signal, the means for enhancing employing a 
probabilistic-based model that correlates between the speech signal and the image data so 
as to facilitate discrimination of noise from the speech signal. 